1
00:00:00,790 --> 00:00:07,320
[Music]

2
00:00:09,070 --> 00:00:11,840
[Applause]

3
00:00:11,850 --> 00:00:14,660
Thank you for the invitation. Today I'm

4
00:00:14,670 --> 00:00:17,600
going to tell you about a project that's

5
00:00:17,610 --> 00:00:19,310
a collaboration between my lab, including

6
00:00:19,320 --> 00:00:22,160
Celia Blanco, some of you have met her

7
00:00:22,170 --> 00:00:26,269
here, she's here at AbSciCon, as well as

8
00:00:26,279 --> 00:00:28,030
Robert Pascal's lab, and we have also had

9
00:00:28,040 --> 00:00:30,229
essential input from Uli Müller and

10
00:00:30,239 --> 00:00:32,210
Jerry Joyce. So I'm going to tell you

11
00:00:32,220 --> 00:00:34,910
about this project, which was published a

12
00:00:37,220 --> 00:00:39,319
Okay, so we are interested in the RNA

13
00:00:39,329 --> 00:00:42,560
world and in evolution of the RNA world.

14
00:00:42,570 --> 00:00:44,450
So for those of you who haven't thought

15
00:00:44,460 --> 00:00:46,460
about the concept of fitness landscapes

16
00:00:46,470 --> 00:00:49,790
before, let me just explain to you what

17
00:00:49,800 --> 00:00:52,220
we mean. So suppose you have a list of

18
00:00:52,230 --> 00:00:55,580
all the possible sequences, so 4^N.

19
00:00:55,590 --> 00:00:58,819
In our case, N is 21: 21 random nucleotides.
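
A quick check on the size of this space: with N = 21 randomized positions there are 4^21 possible sequences, and each sequence has 3N single-mutant neighbors (each position can change to one of the three other nucleotides). The minimal Python sketch below only does this arithmetic; the function names are hypothetical helpers for illustration.

```python
def sequence_space_size(n: int, alphabet: int = 4) -> int:
    """Number of distinct sequences of length n over an alphabet of the given size."""
    return alphabet ** n


def single_mutant_neighbors(n: int, alphabet: int = 4) -> int:
    """Each of the n positions can change to any of the other (alphabet - 1) letters."""
    return n * (alphabet - 1)


if __name__ == "__main__":
    n = 21  # length of the random region described in the talk
    print(f"4^{n} = {sequence_space_size(n):,} sequences")  # 4^21 = 4,398,046,511,104 sequences
    print(f"{single_mutant_neighbors(n)} single-mutant neighbors per sequence")  # 63 single-mutant neighbors per sequence
```

4^21 is about 4.4 × 10^12, which is consistent with the point made later in the talk that a starting pool can hold many copies of every possible 21-nt sequence.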

20
00:00:58,829 --> 00:01:00,980
You write them down in some fashion, say on

21
00:01:00,990 --> 00:01:02,900
a sequence coordinate. If you know their

22
00:01:02,910 --> 00:01:04,969
fitness, let's say if they're ribozymes, maybe you

23
00:01:04,979 --> 00:01:07,490
know their catalytic activity, then you

24
00:01:07,500 --> 00:01:09,920
can write down this function in sequence

25
00:01:09,930 --> 00:01:12,319
space, and this function is known as the

26
00:01:12,329 --> 00:01:14,480
fitness landscape. On this landscape, the

27
00:01:14,490 --> 00:01:16,700
peaks correspond to families of active

28
00:01:16,710 --> 00:01:20,059
molecules; these troughs correspond to

29
00:01:20,069 --> 00:01:22,370
valleys of inactive molecules. So natural

30
00:01:22,380 --> 00:01:25,580
selection can be thought of rigorously

31
00:01:25,590 --> 00:01:27,830
as a random walk over this landscape. So

32
00:01:27,840 --> 00:01:29,719
a population will start out in some area

33
00:01:29,729 --> 00:01:32,929
of sequence space, exploring the local

34
00:01:32,939 --> 00:01:34,760
area through mutations. If a mutant

35
00:01:34,770 --> 00:01:36,679
arises that has higher fitness, then the

36
00:01:36,689 --> 00:01:39,739
population as a whole moves upwards, and

37
00:01:39,749 --> 00:01:42,379
so the population kind of climbs up the

38
00:01:42,389 --> 00:01:44,989
fitness landscape. So we think of it as a

39
00:01:44,999 --> 00:01:48,830
random walk with a bias toward climbing

40
00:01:48,840 --> 00:01:50,359
hills. So it looks nice on

41
00:01:50,369 --> 00:01:51,980
this cartoon, but we actually have very

42
00:01:51,990 --> 00:01:54,349
little idea of what real fitness

43
00:01:54,359 --> 00:01:56,599
landscapes look like, and we have some

44
00:01:56,609 --> 00:01:58,760
ideas from work exploring relatively

45
00:01:58,770 --> 00:02:00,440
local areas of fitness landscapes,

46
00:02:00,450 --> 00:02:01,969
looking around a known ribozyme, for

47
00:02:01,979 --> 00:02:05,149
example, in the evolutionary pathways

48
00:02:05,159 --> 00:02:07,399
around a single known ribozyme. But what

49
00:02:07,409 --> 00:02:09,260
my lab is interested in doing is mapping

50
00:02:09,270 --> 00:02:11,210
this entire fitness landscape for the

51
00:02:11,220 --> 00:02:14,500
entire space. So of course we're limited

52
00:02:14,510 --> 00:02:17,270
by experimental

53
00:02:17,280 --> 00:02:20,149
sizes, so we can only look at relatively

54
00:02:20,159 --> 00:02:23,539
short sequences. Fortunately for RNA, even

55
00:02:23,549 --> 00:02:25,729
relatively short sequences, 21-mers, or at

56
00:02:25,739 --> 00:02:28,240
least sequences with a random region of

57
00:02:28,250 --> 00:02:30,830
21 in length, can still be functional. So

58
00:02:30,840 --> 00:02:34,339
our experimental approach is based on in

59
00:02:34,349 --> 00:02:37,699
vitro evolution. So we take a pool of

60
00:02:37,709 --> 00:02:40,160
random sequences, many copies of each

61
00:02:40,170 --> 00:02:43,399
possible sequence, and then we subject

62
00:02:43,409 --> 00:02:45,140
this to a biochemical selection for the

63
00:02:45,150 --> 00:02:47,479
ribozyme activity we're interested in.

64
00:02:47,489 --> 00:02:50,059
The idea is that we'll reduce the frequency

65
00:02:50,069 --> 00:02:52,039
of sequences that lack this activity and

66
00:02:52,049 --> 00:02:54,440
enrich for sequences that have this

67
00:02:54,450 --> 00:02:58,069
activity, and in this way we can pick out

68
00:02:58,079 --> 00:03:00,400
the active sequences. Okay, what are we

69
00:03:00,410 --> 00:03:02,780
interested in, in terms of activities?

70
00:03:02,790 --> 00:03:05,930
We're curious about the genetic code,

71
00:03:05,940 --> 00:03:08,720
like many of us here, and there's a lot

72
00:03:08,730 --> 00:03:10,430
of focus on the ribosome, and justly so,

73
00:03:10,440 --> 00:03:13,009
but the other part of the genetic code

74
00:03:13,019 --> 00:03:15,589
is the hooking up of the proper amino

75
00:03:15,599 --> 00:03:18,500
acids to the proper RNAs, and sometimes

76
00:03:18,510 --> 00:03:20,300
that's called the second genetic code. So

77
00:03:20,310 --> 00:03:23,030
in modern biology this is done by

78
00:03:23,040 --> 00:03:26,360
synthetases, the aminoacyl-tRNA synthetases.

79
00:03:26,370 --> 00:03:28,280
Michael Yarus many years ago showed that

80
00:03:28,290 --> 00:03:31,900
it's possible to find ribozymes that can

81
00:03:31,910 --> 00:03:36,379
catalyze this reaction, so ribozymes that

82
00:03:36,389 --> 00:03:38,629
attach amino acids to RNA. However,

83
00:03:38,639 --> 00:03:39,979
before we jumped into the selection, we

84
00:03:39,989 --> 00:03:43,159
noticed that there are some problems

85
00:03:43,169 --> 00:03:46,809
which are experimentally difficult to

86
00:03:46,819 --> 00:03:49,400
deal with. This aminoacyl adenylate, the

87
00:03:49,410 --> 00:03:52,280
reactive substrate, is highly reactive in

88
00:03:52,290 --> 00:03:54,349
water, so it's not thought to be

89
00:03:54,359 --> 00:03:55,789
prebiotically plausible because of this,

90
00:03:55,799 --> 00:03:57,849
and it's also, from a practical

91
00:03:57,859 --> 00:04:00,559
standpoint, difficult to use in the lab.

92
00:04:00,569 --> 00:04:03,229
So we looked to our organic chemistry

93
00:04:03,239 --> 00:04:05,150
colleagues, Ziwei Liu, working in Robert

94
00:04:05,160 --> 00:04:07,789
Pascal's lab at the University of

95
00:04:07,799 --> 00:04:11,330
Montpellier, and they had come up with

96
00:04:11,340 --> 00:04:13,909
this form of activated amino acids, the

97
00:04:13,919 --> 00:04:15,619
5(4H)-oxazolones. These are

98
00:04:15,629 --> 00:04:19,159
related to what might be more familiar

99
00:04:19,169 --> 00:04:20,800
to you, the N-carboxyanhydrides. You

100
00:04:20,810 --> 00:04:23,360
can see, looking at the structure, the

101
00:04:23,370 --> 00:04:25,640
part that will become the amine and the

102
00:04:25,650 --> 00:04:25,990
carboxylate and the side chain of the

103
00:04:26,000 --> 00:04:28,930
amino

104
00:04:28,940 --> 00:04:31,660
acid. This pendant group here is

105
00:04:31,670 --> 00:04:33,550
convenient because Ziwei can put a

106
00:04:33,560 --> 00:04:38,050
biotin on there, and that way we have

107
00:04:38,060 --> 00:04:40,330
this capture handle for reactions. Ziwei

108
00:04:40,340 --> 00:04:41,950
had also noticed that these oxazolones,

109
00:04:41,960 --> 00:04:45,280
which are prebiotically plausible

110
00:04:45,290 --> 00:04:48,490
(they're formed from simulated volcanic

111
00:04:48,500 --> 00:04:50,890
mixtures), these oxazolones do react

112
00:04:50,900 --> 00:04:54,360
slowly with nucleotides. So in this case

113
00:04:54,370 --> 00:04:57,850
I'm showing a modified tyrosine, which is

114
00:04:57,860 --> 00:04:59,950
very slowly reactive with nucleotides. So

115
00:04:59,960 --> 00:05:01,050
it's a perfect setup for a selection: we

116
00:05:01,060 --> 00:05:04,360
know the reaction is thermodynamically

117
00:05:04,370 --> 00:05:09,580
downhill, and we want to find a ribozyme that

118
00:05:09,590 --> 00:05:11,470
catalyzes this reaction. So the selection

119
00:05:11,480 --> 00:05:14,080
scheme that was devised by Abe Pressman in

120
00:05:14,090 --> 00:05:16,390
my lab started with the DNA pool

121
00:05:16,400 --> 00:05:16,810
covering sequence space, transcribed into

122
00:05:16,820 --> 00:05:21,340
RNA

123
00:05:21,350 --> 00:05:24,580
we're interested in, but a few of them

124
00:05:24,590 --> 00:05:26,650
will react with our substrate. If they

125
00:05:26,660 --> 00:05:27,700
react, they become biotinylated. By the

126
00:05:27,710 --> 00:05:30,640
way, this is biotin.
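
The enrichment logic of one selection round (sequences lacking the activity are depleted, reactive ones are enriched) can be illustrated with a toy simulation. This is a minimal sketch under assumptions that are not from the talk: each molecule is assumed to react with probability 1 - exp(-k[S]t) (a pseudo-first-order model), every reacted, biotinylated molecule is assumed to be captured on the streptavidin beads, and the pool composition and rate constants are hypothetical, in arbitrary units.

```python
import math
import random


def reaction_probability(k: float, substrate_conc: float, time: float) -> float:
    """Probability that one molecule has reacted, under an assumed pseudo-first-order model."""
    return 1.0 - math.exp(-k * substrate_conc * time)


def selection_round(pool, substrate_conc, time, rng):
    """Simulate one reaction + capture step.

    pool maps a sequence label to (copy_number, rate_constant). Every molecule that
    reacts is assumed to be biotinylated and captured; nothing else is captured.
    Returns the captured copy number for each sequence.
    """
    captured = {}
    for seq, (copies, k) in pool.items():
        p = reaction_probability(k, substrate_conc, time)
        captured[seq] = sum(1 for _ in range(copies) if rng.random() < p)
    return captured


def frequencies(counts):
    """Convert copy numbers to relative frequencies (empty dict if nothing was captured)."""
    total = sum(counts.values())
    return {seq: c / total for seq, c in counts.items()} if total else {}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the toy run is reproducible
    # Hypothetical pool: one active sequence, one weak one, and a large inactive background.
    pool = {"active": (1000, 1.0), "weak": (1000, 0.05), "inactive": (98000, 0.001)}
    before = frequencies({seq: copies for seq, (copies, _) in pool.items()})
    after = frequencies(selection_round(pool, substrate_conc=1.0, time=1.0, rng=rng))
    for seq in pool:
        print(f"{seq:<9} {before[seq]:.4f} -> {after.get(seq, 0.0):.4f}")
```

With these made-up numbers the active sequence goes from 1% of the input pool to most of the captured fraction after a single round. Real selections also carry background from nonspecific capture, which this sketch ignores.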

127
00:05:30,650 --> 00:05:32,530
So these biotinylated RNAs can then be

128
00:05:32,540 --> 00:05:35,530
captured by streptavidin bead pulldown,

129
00:05:35,540 --> 00:05:37,810
and then we can amplify these by RT-PCR,

130
00:05:37,820 --> 00:05:43,570
do high-throughput sequencing, and follow the

131
00:05:43,580 --> 00:05:47,230
fate of these sequences over time. So

132
00:05:47,240 --> 00:05:49,120
that process gives you a list of active

133
00:05:49,130 --> 00:05:51,190
sequences. So think of this as a list of

134
00:05:51,200 --> 00:05:53,110
ribozyme sequences which you've kind of

135
00:05:53,120 --> 00:05:56,230
culled from the space of all possible

136
00:05:56,240 --> 00:05:57,670
sequences. Then our next step, what

137
00:05:57,680 --> 00:05:59,860
we really want to do, is associate a

138
00:05:59,870 --> 00:06:02,320
catalytic activity, a rate constant, let's

139
00:06:02,330 --> 00:06:04,420
say, with each of those sequences. So to

140
00:06:04,430 --> 00:06:07,180
do this, we came up with a method, with

141
00:06:07,190 --> 00:06:09,610
input from Uli and Jerry, called

142
00:06:09,620 --> 00:06:12,730
kinetic sequencing. The idea is that we

143
00:06:12,740 --> 00:06:15,280
take an RNA pool which has medium

144
00:06:15,290 --> 00:06:17,530
diversity. It's not so diverse that we

145
00:06:17,540 --> 00:06:19,630
can't get much information about any

146
00:06:19,640 --> 00:06:21,760
particular sequence, but the pool has

147
00:06:21,770 --> 00:06:23,740
not converged so much either, to the point

148
00:06:23,750 --> 00:06:26,050
where we can only look at a small number

149
00:06:26,060 --> 00:06:28,750
of sequences. So it's a medium-diversity

150
00:06:28,760 --> 00:06:29,980
pool. We split it into several aliquots

151
00:06:29,990 --> 00:06:32,980
and react them with different

152
00:06:32,990 --> 00:06:34,300
concentrations of our substrate. You

153
00:06:34,310 --> 00:06:36,240
could also think of doing this with

154
00:06:36,250 --> 00:06:38,650
different time points, but substrate

155
00:06:38,660 --> 00:06:39,800
concentrations were easier for us for

156
00:06:39,810 --> 00:06:42,740
technical reasons.

157
00:06:42,750 --> 00:06:45,080
And then we react them and capture them,

158
00:06:45,090 --> 00:06:47,990
capture the reacted molecules, sequence

159
00:06:48,000 --> 00:06:50,750
those captured molecules, and then that

160
00:06:50,760 --> 00:06:53,240
gives us basically four points along

161
00:06:53,250 --> 00:06:55,310
this kinetic curve for different

162
00:06:55,320 --> 00:06:57,070
concentrations, and that allows us to

163
00:06:57,080 --> 00:06:59,510
calculate the rate constant for these

164
00:06:59,520 --> 00:07:00,980
sequences. And depending on how deeply

165
00:07:00,990 --> 00:07:03,500
you sequence, you could get information

166
00:07:03,510 --> 00:07:07,310
for thousands to perhaps hundreds of

167
00:07:07,320 --> 00:07:09,520
thousands of molecules. This kinetic

168
00:07:09,530 --> 00:07:12,409
sequencing scheme works pretty well at

169
00:07:12,419 --> 00:07:14,840
predicting the activity. Here's a

170
00:07:14,850 --> 00:07:18,200
correlation between the measurement by

171
00:07:18,210 --> 00:07:21,560
k-Seq, kinetic sequencing, and a traditional

172
00:07:21,570 --> 00:07:23,150
gel shift assay. Okay.
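
The per-sequence fitting step can be sketched as follows. The kinetic model here is an assumption made for illustration: the fraction of a given sequence that has reacted after a fixed time t at substrate concentration [S] is taken to follow a pseudo-first-order curve, F = A(1 - exp(-k[S]t)), where A is the maximum reactive fraction and k the rate constant, fitted from the four concentrations per sequence. The published k-Seq analysis may use a different or extended model (for example, one correcting for substrate decay), and the function names, concentrations and reaction time below are hypothetical.

```python
import math


def fraction_reacted(conc, k, amplitude, time):
    """Assumed pseudo-first-order model: F = A * (1 - exp(-k * [S] * t))."""
    return amplitude * (1.0 - math.exp(-k * conc * time))


def fit_rate_constant(concs, fractions, time, k_min=1e-6, k_max=1e2, steps=4000):
    """Least-squares fit of (k, A) by a log-spaced grid search over k.

    For a fixed k the model is linear in A, so the best A has a closed form,
    A = sum(y * g) / sum(g * g) with g = 1 - exp(-k * [S] * t), clipped to [0, 1].
    """
    best_sse, best_k, best_a = float("inf"), None, None
    log_lo, log_hi = math.log10(k_min), math.log10(k_max)
    for i in range(steps + 1):
        k = 10 ** (log_lo + (log_hi - log_lo) * i / steps)
        g = [1.0 - math.exp(-k * c * time) for c in concs]
        denom = sum(gi * gi for gi in g)
        if denom == 0.0:
            continue
        a = sum(y * gi for y, gi in zip(fractions, g)) / denom
        a = min(1.0, max(0.0, a))  # a reacted fraction must lie in [0, 1]
        sse = sum((y - a * gi) ** 2 for y, gi in zip(fractions, g))
        if sse < best_sse:
            best_sse, best_k, best_a = sse, k, a
    return best_k, best_a


if __name__ == "__main__":
    # Hypothetical design: four substrate concentrations (uM) and one reaction time (min).
    concs = [2.0, 10.0, 50.0, 250.0]
    time = 90.0
    # Synthetic, noise-free data generated from made-up parameters, to check the fit.
    data = [fraction_reacted(c, k=0.002, amplitude=0.6, time=time) for c in concs]
    k_hat, a_hat = fit_rate_constant(concs, data, time)
    print(f"k = {k_hat:.3g} per uM per min, A = {a_hat:.3f}")
```

The grid search only keeps the sketch dependency-free; because the model is linear in A for a fixed k, only k needs searching. With real, noisy read counts one would normally use a nonlinear least-squares routine and estimate uncertainties, for instance by resampling the sequencing data.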

173
00:07:23,160 --> 00:07:24,800
So now we have this list of ribozyme

174
00:07:24,810 --> 00:07:27,230
sequences and their associated rate

175
00:07:27,240 --> 00:07:28,880
constants. What we want to do is answer

176
00:07:28,890 --> 00:07:31,760
some interesting questions about this

177
00:07:31,770 --> 00:07:33,890
fitness landscape. One interesting

178
00:07:33,900 --> 00:07:35,330
question is: how smooth is this landscape?

179
00:07:35,340 --> 00:07:38,510
And the reason why this is interesting

180
00:07:38,520 --> 00:07:39,920
is, if you had a very smooth landscape,

181
00:07:39,930 --> 00:07:42,680
for this kind of an extreme example, a

182
00:07:42,690 --> 00:07:44,330
single-peak, very smooth landscape, then

183
00:07:44,340 --> 00:07:47,000
you can imagine no matter where you

184
00:07:47,010 --> 00:07:51,710
start on this landscape, the

185
00:07:51,720 --> 00:07:54,500
population can feel the peak, so you

186
00:07:54,510 --> 00:07:56,750
can do a very smooth walk up to the top

187
00:07:56,760 --> 00:07:59,690
and optimize, find the global optimum

188
00:07:59,700 --> 00:08:01,430
over sequence space. On the other hand, if

189
00:08:01,440 --> 00:08:04,219
you have a highly rugged landscape like

190
00:08:04,229 --> 00:08:05,930
this, then, kind of depending on where you

191
00:08:05,940 --> 00:08:08,779
start in sequence space, you might

192
00:08:08,789 --> 00:08:11,420
make a local climb to what you think is

193
00:08:11,430 --> 00:08:13,820
the top of a peak, or what is the top of a

194
00:08:13,830 --> 00:08:16,370
peak, but you've missed the global

195
00:08:16,380 --> 00:08:18,409
optimum. So we were very interested in

196
00:08:18,419 --> 00:08:21,200
understanding these potential

197
00:08:21,210 --> 00:08:23,659
evolutionary pathways. Here's an example

198
00:08:23,669 --> 00:08:28,310
of the type of pathways that we find. So

199
00:08:28,320 --> 00:08:30,350
over here is one motif, motif 1, of the

200
00:08:30,360 --> 00:08:33,440
ribozymes that we found, ribozyme 1b,

201
00:08:33,450 --> 00:08:37,219
marked in blue over here; motif 2 over

202
00:08:37,229 --> 00:08:40,190
here. You can see motif sequence 2.1 is

203
00:08:40,200 --> 00:08:42,589
the global optimum of the landscape. So

204
00:08:42,599 --> 00:08:44,570
what we can observe is that, looking at

205
00:08:44,580 --> 00:08:46,760
the activity of these sequences, which we

206
00:08:46,770 --> 00:08:48,920
measured by k-Seq, and the evolutionary

207
00:08:48,930 --> 00:08:51,590
distances, we can find pathways that

208
00:08:51,600 --> 00:08:53,780
connect relatively related sequences. You

209
00:08:53,790 --> 00:08:57,380
could call this motif 1b

210
00:08:57,390 --> 00:08:59,390
and 1a local optima, which are

211
00:08:59,400 --> 00:09:02,810
connected by reasonable evolutionary

212
00:09:02,820 --> 00:09:05,900
pathways, but going from this motif 1

213
00:09:05,910 --> 00:09:08,900
area to motif 2, you have to go through

214
00:09:08,910 --> 00:09:10,550
a large valley of sequences that have

215
00:09:10,560 --> 00:09:13,600
basically no activity, or at least

216
00:09:13,610 --> 00:09:16,910
baseline activity. So this tells us that

217
00:09:16,920 --> 00:09:19,640
only nearby peaks are connected by

218
00:09:19,650 --> 00:09:22,850
viable pathways. If we zoom out to the

219
00:09:22,860 --> 00:09:24,680
entire landscape, so I'm not showing all

220
00:09:24,690 --> 00:09:25,910
the different ribozymes that we found,

221
00:09:25,920 --> 00:09:27,980
there would be thousands of points if

222
00:09:27,990 --> 00:09:31,430
that were the case, but what I'm showing

223
00:09:31,440 --> 00:09:33,620
you are the top motifs that we found,

224
00:09:33,630 --> 00:09:36,590
which are shown by these big circles, and

225
00:09:36,600 --> 00:09:41,960
the best five pathways that we found

226
00:09:41,970 --> 00:09:45,200
between these major ribozyme centers. So

227
00:09:45,210 --> 00:09:46,000
what you can see is sequence 2.1, high

228
00:09:46,010 --> 00:09:51,830
activity,

229
00:09:51,840 --> 00:09:54,440
a kind of low-lying plateau, I would say, of

230
00:09:54,450 --> 00:09:56,210
motif 1, where there might be some

231
00:09:56,220 --> 00:09:57,920
reasonable interconnections, but you have

232
00:09:57,930 --> 00:10:00,500
to look a little bit hard for them, but

233
00:10:00,510 --> 00:10:02,570
they exist. But motif 2 in particular

234
00:10:02,580 --> 00:10:05,060
is kind of cut off from the rest of this

235
00:10:05,070 --> 00:10:07,040
landscape by multiple mutations. These

236
00:10:07,050 --> 00:10:10,040
dotted lines indicate multiple mutations

237
00:10:10,050 --> 00:10:13,580
required, multiple mutations at

238
00:10:13,590 --> 00:10:18,290
essentially no activity, to traverse this

239
00:10:18,300 --> 00:10:20,150
part of the landscape. So with that, I'd

240
00:10:20,160 --> 00:10:23,090
like to just close by acknowledging the

241
00:10:23,100 --> 00:10:25,610
people involved in this work: Abe, who led

242
00:10:25,620 --> 00:10:27,830
this project, and Celia, who

243
00:10:27,840 --> 00:10:30,530
contributed to a recognized analysis I

244
00:10:30,540 --> 00:10:32,870
didn't have time to talk about, and Evan

245
00:10:32,880 --> 00:10:34,700
contributed some experimental

246
00:10:34,710 --> 00:10:37,100
validation, which I also didn't have time

247
00:10:37,110 --> 00:10:39,590
to talk about. And I'd like to again thank

248
00:10:39,600 --> 00:10:45,090
our collaborators, Ziwei and Robert, and
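
The two landscape questions in the talk, how rugged the landscape is (how many separate peaks) and whether peaks are joined by pathways that never drop into an inactive valley, can be posed as graph queries on the measured sequences. The sketch below is an illustration under stated assumptions, not the pathfinding algorithm used in the study: sequences are nodes, edges join sequences that differ by at most `max_mutations` substitutions (Hamming distance), a peak is a sequence with no fitter neighbor, and a path counts as viable only if every sequence on it has activity at or above a chosen threshold. The toy sequences, activities and threshold are hypothetical.

```python
from collections import deque


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(seq, landscape, max_mutations=1):
    """Measured sequences within max_mutations substitutions of seq (excluding seq)."""
    return [s for s in landscape if s != seq and hamming(s, seq) <= max_mutations]


def local_optima(landscape, max_mutations=1):
    """Sequences whose activity is not exceeded by any measured neighbor (the peaks)."""
    return sorted(
        s for s, act in landscape.items()
        if all(landscape[n] <= act for n in neighbors(s, landscape, max_mutations))
    )


def viable_path(start, goal, landscape, threshold, max_mutations=1):
    """Breadth-first search for a path with the fewest steps from start to goal that
    only visits sequences with activity >= threshold. Returns None if none exists."""
    if landscape.get(start, 0.0) < threshold or landscape.get(goal, 0.0) < threshold:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:  # walk back through predecessors
                path.append(current)
                current = previous[current]
            return path[::-1]
        for n in neighbors(current, landscape, max_mutations):
            if n not in previous and landscape[n] >= threshold:
                previous[n] = current
                queue.append(n)
    return None


if __name__ == "__main__":
    # Hypothetical 4-nt toy landscape (activities in arbitrary units).
    toy = {
        "AAAA": 1.0, "AAAC": 0.8, "AACC": 0.9,  # two nearby peaks joined by an active step
        "CCGG": 5.0, "CCGA": 2.0,               # an isolated high peak
        "AACG": 0.0, "ACGG": 0.0,               # inactive sequences
    }
    print(local_optima(toy))                                    # ['AAAA', 'AACC', 'CCGG']
    print(viable_path("AAAA", "AACC", toy, threshold=0.5))      # ['AAAA', 'AAAC', 'AACC']
    print(viable_path("AAAA", "CCGG", toy, threshold=0.5))      # None
    print(viable_path("AAAA", "CCGG", toy, threshold=0.5, max_mutations=4))  # ['AAAA', 'CCGG']
```

Scanning every measured sequence to find neighbors is quadratic in the number of sequences; for thousands of 21-mers it would be cheaper to generate the 63 single mutants of each sequence and look them up in a dictionary. The `max_mutations` parameter is one simple way to let a path jump over several substitutions at once.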

249
00:10:49,350 --> 00:10:56,519
I think we have time for one question.

250
00:10:56,529 --> 00:10:58,889
That was a really interesting talk, thank

251
00:10:58,899 --> 00:11:00,389
you. I just wondered, in terms of

252
00:11:00,399 --> 00:11:02,040
building up your evolutionary landscape,

253
00:11:02,050 --> 00:11:03,840
whether you're going to build in looking

254
00:11:03,850 --> 00:11:05,910
at the possible role of mobile genetic

255
00:11:05,920 --> 00:11:08,220
elements, which could potentially cause a

256
00:11:08,230 --> 00:11:11,009
discontinuity in evolution that could

257
00:11:11,019 --> 00:11:13,680
bridge some of those gaps. Okay, yes. So

258
00:11:13,690 --> 00:11:16,620
I really oversimplified this

259
00:11:16,630 --> 00:11:20,280
in the picture, saying that we're just

260
00:11:20,290 --> 00:11:22,920
looking at the step-by-step mutations. In

261
00:11:22,930 --> 00:11:25,590
our pathfinding algorithm, you

262
00:11:25,600 --> 00:11:27,540
can allow larger jumps in sequence space,

263
00:11:27,550 --> 00:11:29,939
and you get pretty much the same picture

264
00:11:29,949 --> 00:11:34,050
if you allow up to four mutations, but if

265
00:11:34,060 --> 00:11:35,730
you allow quite large gaps, then you can

266
00:11:35,740 --> 00:11:40,650
start to see kind of an it completely